Abstract This work is the first part of a project dealing with an in-depth study of effective
techniques used in econometrics in order to make accurate forecasts in the concrete framework
of one of the major economies of the most productive Italian area, namely the province of
Verona. In particular, we develop an approach mainly based on vector autoregressions, where
lagged values of two or more variables are considered, Granger causality, and the stochastic
trend approach useful to work with the cointegration phenomenon. These techniques constitute
the core of the present paper, whereas in the second part of the project, we present how these
approaches can be applied to the economic data at our disposal in order to obtain a concrete
analysis of import-export behavior for the considered productive area of Verona.

Keywords Econometrics time series, autoregressive models, Granger causality,
cointegration, stochastic nonstationarity, AIC and BIC criteria, trends and breaks

1 Introduction 

The analysis of time series data constitutes a key ingredient in econometric studies.
Recent years have been characterized by an increasing interest in the study of
econometric time series. Although various types of regression analysis and related
forecast methods are rather old, the worldwide financial crisis experienced by markets
starting from the last months of 2007, and which is not yet finished, has put more
attention on the subject. Moreover, analysis and forecast problems have become of
great importance even for medium and small enterprises since their economic
sustainability is strictly related to the propensity of banks to give credit at reasonable
conditions.

In particular, great efforts have been made to read economic data not as monads,
but rather as constituting pieces of a whole. Namely, new techniques have been
developed to study interconnections and dependencies between different factors
characterizing the economic history of a certain market, a given firm, a specified industrial
area, and so on. From this point of view, methods such as vector autoregression,
the cointegration approach, and copula techniques have benefited from new
research impulses.

A challenging problem is then to apply such instruments in concrete situations, and
the problem becomes even harder if we take into account that economies have been hit
hard by the aforementioned crisis. A particularly important case study is constituted by
a close analysis of import-export time series. In fact, such information, spanning
from countries to small firms, provides highly interesting hints
for people, for example, politicians or CEOs, to depict future economic scenarios and
related investment plans for the markets in which they are involved.

Exploiting precious economic data that the Commerce Chamber of Verona
Province has put at our disposal, we successfully applied some of the relevant
approaches already cited to find dependencies between economic factors characterizing
the Province economy and then to make effective forecasts, very close to the real
behavior of the studied markets.

For completeness, we have split our project into two parts, namely the present
one, which aims at giving a self-contained introduction to the statistical techniques of
interest, and the second one, where the Verona import-export case study is
treated in detail.

In what follows, we first recall univariate time series models, paying particular
attention to the AR model, which relates a time series to its past values. We will
explain how to make predictions by using these models, how to choose the lags,
for example, using the Akaike and Bayesian information criteria (AIC, resp. BIC),
and how to behave in the presence of trends or structural breaks. Then we move to the
vector autoregression (VAR) model, in which lagged values of two or more variables
are used to forecast future values of these variables. Moreover, we present
Granger causality, and, in the last part, we return to the topic of stochastic trends, introducing
the phenomenon of cointegration.

2 Univariate time-series models 

Univariate models have been widely used for short-run forecasts (see, e.g., [6, Examples
of Chapter 2]). In what follows, we recall some of these techniques, focusing
particularly on the analysis of autoregressive (AR) processes, moving average
(MA) processes, and a combination of both types, the so-called ARMA processes;
for further details, see, for example, [3, 2, 8] and references therein.

The observation on the time-series variable $Y$ made at date $t$ is denoted by $Y_t$,
whereas $T \in \mathbb{N}^+$ indicates the total number of observations. Moreover, we denote
the $j$th lag of a time series $\{Y_t\}_{t=0,\dots,T}$ by $Y_{t-j}$ (the value of the variable $Y$
$j$ periods ago); similarly, $Y_{t+j}$ denotes the value of $Y$ $j$ periods into the future, where,
for any fixed $t \in \{0,\dots,T\}$, $j$ is such that $j \in \mathbb{N}^+$, $t - j > 0$, and $t + j < T$.
The $j$th autocovariance of a series $Y_t$ is the covariance between $Y_t$ and its $j$th lag,
that is, $\text{autocovariance}_j = \sigma_j := \operatorname{cov}(Y_t, Y_{t-j})$, whereas the $j$th autocorrelation
coefficient is the correlation between $Y_t$ and $Y_{t-j}$, that is,

$$\text{autocorrelation}_j = \rho_j := \operatorname{corr}(Y_t, Y_{t-j}) = \frac{\operatorname{cov}(Y_t, Y_{t-j})}{\sqrt{\operatorname{var}(Y_t)\operatorname{var}(Y_{t-j})}}.$$

When the average and variance of a variable are unknown, we can estimate them by
taking a random sample of $n$ observations. In a simple random sample, $n$ objects are
drawn at random from a population, and each object is equally likely to be drawn. The
value of the random variable $Y$ for the $i$th randomly drawn object is denoted $Y_i$.
Because each object is equally likely to be drawn and the distribution of $Y_i$ is the same
for all $i$, the random variables $Y_1,\dots,Y_n$ are independent and identically distributed
(i.i.d.). Given a variable $Y$, we denote by $\bar{Y}$ its sample average with respect to the
$n$ observations $Y_1,\dots,Y_n$, that is,
$\bar{Y} = \frac{1}{n}(Y_1 + Y_2 + \dots + Y_n) = \frac{1}{n}\sum_{i=1}^{n} Y_i$, whereas we define the
related sample variance by $S_Y^2 := \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$. The $j$th autocovariances,
resp. autocorrelations, can be estimated by the $j$th sample autocovariances, resp.
autocorrelations, as follows:

$$\hat{\sigma}_j := \frac{1}{T}\sum_{t=j+1}^{T}(Y_t - \bar{Y}_{j+1,T})(Y_{t-j} - \bar{Y}_{1,T-j}), \quad \text{resp.} \quad \hat{\rho}_j := \frac{\hat{\sigma}_j}{\widehat{\operatorname{var}}(Y_t)},$$

where $\bar{Y}_{j+1,T}$ denotes the sample average of $Y_t$ computed over the observations
$t = j+1,\dots,T$.

Concerning forecasts based on regression models that relate a time
series variable to its past values, for completeness, we shall start with the first-order
autoregressive process, namely the AR(1) model, which uses $Y_{t-1}$ to forecast $Y_t$. A
systematic way to forecast is to estimate an ordinary least squares (OLS) regression.
The OLS estimator chooses the regression coefficients so that the estimated regression
line is as close as possible to the observed data, where the closeness is measured
by the sum of the squared mistakes made in predicting $Y_t$ given $Y_{t-1}$. Hence, the
AR(1) model for the series $Y_t$ is given by


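$$Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t, \tag{1}$$

where $\beta_0$ and $\beta_1$ are the regression coefficients. In this case, the intercept $\beta_0$ is the
value of the regression line when $Y_{t-1} = 0$, the slope $\beta_1$ represents the change in
$Y_t$ associated with a unit change in $Y_{t-1}$, and $u_t$ denotes the error term whose nature
will be clarified later. Let us assume that the value $Y_{t_0}$ of the time series $Y_t$ at initial
time $t_0$ is given; then $Y_{t_0+1} = \beta_0 + \beta_1 Y_{t_0} + u_{t_0+1}$, so that iterating relation (1) up to
order $\tau > 0$, we get

$$\begin{aligned}
Y_{t_0+\tau} &= (1 + \beta_1 + \beta_1^2 + \dots + \beta_1^{\tau-1})\beta_0 + \beta_1^{\tau} Y_{t_0} \\
&\quad + \beta_1^{\tau-1} u_{t_0+1} + \beta_1^{\tau-2} u_{t_0+2} + \dots + \beta_1 u_{t_0+\tau-1} + u_{t_0+\tau} \\
&= \beta_1^{\tau} Y_{t_0} + \frac{1-\beta_1^{\tau}}{1-\beta_1}\beta_0 + \sum_{j=0}^{\tau-1} \beta_1^{j} u_{t_0+\tau-j}.
\end{aligned}$$

Hence, taking $t = t_0 + \tau$ with $t_0 = 0$, we obtain

$$Y_t = \beta_1^{t} Y_0 + \frac{1-\beta_1^{t}}{1-\beta_1}\beta_0 + \sum_{j=0}^{t-1} \beta_1^{j} u_{t-j}. \tag{2}$$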
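The iteration leading to Eq. (2) can be checked numerically. The following minimal Python sketch is ours and is not part of the cited references; the function names and the shock values are hypothetical. It compares the recursion (1) with the closed form $\beta_1^{t} Y_0 + \frac{1-\beta_1^{t}}{1-\beta_1}\beta_0 + \sum_{j=0}^{t-1}\beta_1^{j}u_{t-j}$, assuming $\beta_1 \neq 1$ so that the geometric sum can be written as a fraction.

```python
def ar1_path(beta0, beta1, y0, shocks):
    """Iterate Y_t = beta0 + beta1 * Y_{t-1} + u_t starting from Y_0 = y0.

    shocks[t-1] holds u_t for t = 1, ..., len(shocks).
    """
    path = [y0]
    for u in shocks:
        path.append(beta0 + beta1 * path[-1] + u)
    return path


def ar1_closed_form(beta0, beta1, y0, shocks):
    """Evaluate the closed form at t = len(shocks); requires beta1 != 1."""
    t = len(shocks)
    deterministic = beta1 ** t * y0 + (1 - beta1 ** t) / (1 - beta1) * beta0
    # sum_{j=0}^{t-1} beta1^j * u_{t-j}; u_{t-j} is stored at index t-j-1
    stochastic = sum(beta1 ** j * shocks[t - j - 1] for j in range(t))
    return deterministic + stochastic


shocks = [0.3, -0.1, 0.4, 0.0, -0.2]   # hypothetical values of u_1, ..., u_5
y_recursive = ar1_path(1.0, 0.5, 2.0, shocks)[-1]
y_closed = ar1_closed_form(1.0, 0.5, 2.0, shocks)
print(y_recursive, y_closed)
```

Both numbers agree up to floating-point rounding, which illustrates that the closed form is only a rearrangement of the recursion.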


A time series $Y_t$ is called stationary if its probability distribution does not change
over time, that is, if the joint distribution of $(Y_{s+1}, Y_{s+2}, \dots, Y_{s+T})$ does not depend
on $s$; otherwise, $Y_t$ is said to be nonstationary. In (2), the process $Y_t$ consists of both
time-dependent deterministic and stochastic parts, and, thus, it cannot be stationary.

Formally, the process with stochastic initial conditions results from (2) if and only
if $|\beta_1| < 1$. It follows that if $\lim_{t_0 \to -\infty} Y_{t_0}$ is bounded, then, as $t_0 \to -\infty$, we have

$$Y_t = \frac{\beta_0}{1-\beta_1} + \sum_{j=0}^{\infty} \beta_1^{j} u_{t-j}; \tag{3}$$

see, for example, [6, Chap. 2.1.1]. Equation (3) can be rewritten by means of the lag
operator, which acts as follows: $LY_t = Y_{t-1}$, $L^2 Y_t = Y_{t-2}$, ..., $L^k Y_t = Y_{t-k}$, so
that Eq. (1) becomes $(1 - \beta_1 L)Y_t = \beta_0 + u_t$. Assuming that $E[u_t] = 0$ for all $t$, we
have


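$$E[Y_t] = E\left[\frac{\beta_0}{1-\beta_1} + \sum_{j=0}^{\infty}\beta_1^{j}u_{t-j}\right] = \frac{\beta_0}{1-\beta_1} + \sum_{j=0}^{\infty}\beta_1^{j}E[u_{t-j}] = \frac{\beta_0}{1-\beta_1} = \mu,$$

$$\begin{aligned}
V[Y_t] &= E\left[\left(Y_t - \frac{\beta_0}{1-\beta_1}\right)^{2}\right] = E\left[\left(\sum_{j=0}^{\infty}\beta_1^{j}u_{t-j}\right)^{2}\right] \\
&= E[(u_t + \beta_1 u_{t-1} + \beta_1^{2}u_{t-2} + \dots)^{2}] \\
&= E[u_t^{2} + \beta_1^{2}u_{t-1}^{2} + \beta_1^{4}u_{t-2}^{2} + \dots + 2\beta_1 u_t u_{t-1} + 2\beta_1^{2}u_t u_{t-2} + \dots] \\
&= \sigma^{2}(1 + \beta_1^{2} + \beta_1^{4} + \dots) = \frac{\sigma^{2}}{1-\beta_1^{2}},
\end{aligned}$$

where we have used that $E[u_t^{2}] = \sigma^{2}$, $E[u_t u_s] = 0$ for $t \neq s$, and $|\beta_1| < 1$. Hence, both the mean
and variance are constants, and thus the covariances are given by

$$\begin{aligned}
\operatorname{Cov}[Y_t, Y_{t-\tau}] &= E\left[\left(Y_t - \frac{\beta_0}{1-\beta_1}\right)\left(Y_{t-\tau} - \frac{\beta_0}{1-\beta_1}\right)\right] \\
&= E[(u_t + \beta_1 u_{t-1} + \dots + \beta_1^{\tau}u_{t-\tau} + \dots) \\
&\qquad \times (u_{t-\tau} + \beta_1 u_{t-\tau-1} + \beta_1^{2}u_{t-\tau-2} + \dots)] \\
&= E[(u_t + \beta_1 u_{t-1} + \dots + \beta_1^{\tau-1}u_{t-\tau+1} \\
&\qquad + \beta_1^{\tau}(u_{t-\tau} + \beta_1 u_{t-\tau-1} + \beta_1^{2}u_{t-\tau-2} + \dots)) \\
&\qquad \times (u_{t-\tau} + \beta_1 u_{t-\tau-1} + \beta_1^{2}u_{t-\tau-2} + \dots)] \\
&= \beta_1^{\tau}E[(u_{t-\tau} + \beta_1 u_{t-\tau-1} + \beta_1^{2}u_{t-\tau-2} + \dots)^{2}] = \beta_1^{\tau}V[Y_{t-\tau}].
\end{aligned}$$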
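The moments just derived can be collected in a few lines of code. The sketch below is only an illustration under the assumptions of this section ($|\beta_1| < 1$, uncorrelated errors with common variance $\sigma^2$); the function name and the numerical values are hypothetical.

```python
def ar1_stationary_moments(beta0, beta1, sigma2, max_lag):
    """Mean, variance and autocovariances of a stationary AR(1), |beta1| < 1."""
    if abs(beta1) >= 1:
        raise ValueError("the AR(1) process is stationary only if |beta1| < 1")
    mean = beta0 / (1 - beta1)
    variance = sigma2 / (1 - beta1 ** 2)
    # Cov[Y_t, Y_{t-tau}] = beta1^tau * V[Y_t]
    autocov = [beta1 ** tau * variance for tau in range(max_lag + 1)]
    return mean, variance, autocov


mean, variance, autocov = ar1_stationary_moments(1.0, 0.5, 1.0, 3)
autocorr = [c / variance for c in autocov]
print(mean, variance, autocorr)
```

Dividing the autocovariances by the variance shows that the autocorrelations of a stationary AR(1) decay geometrically at rate $\beta_1$.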

The previous AR(1) can be generalized by considering arbitrary but finite order $p > 1$.
In particular, an AR($p$) process can be described by the equation

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + u_t, \tag{4}$$

where $\beta_0, \dots, \beta_p$ are constants, whereas $u_t$ is the error term represented by a random
variable with zero mean and variance $\sigma^2 > 0$. Using the lag operator, we can rewrite
Eq. (4) as $(1 - \beta_1 L - \beta_2 L^2 - \dots - \beta_p L^p)Y_t = \beta_0 + u_t$. In such a framework, it is
standard to assume that the following four properties hold (see, e.g., [7, Chap. 14.4]):

• $u_t$ has conditional mean zero, given all the regressors, that is,
$E(u_t \mid Y_{t-1}, Y_{t-2}, \dots) = 0$, which implies that the best forecast of $Y_t$ is given
by the AR($p$) regression.

• $Y_t$ has a stationary distribution, and $Y_t$, $Y_{t-j}$ are assumed to become
independent as $j$ gets large. If the time-series variables are nonstationary, then the
forecast can be biased and inefficient, or conventional OLS-based statistical
inferences can be misleading.

• All the variables have nonzero finite fourth moments.

• There is no perfect multicollinearity, namely no regressor is a perfect linear
function of the other regressors.

2.1 Forecasts

In this section, we show how the previously introduced class of models can be used
to predict the future behavior of a certain quantity of interest. If $Y_t$ follows the AR($p$)
model with known coefficients $\beta_0, \beta_1, \dots, \beta_p$, then the forecast of $Y_{T+1}$ is given by
$\beta_0 + \beta_1 Y_T + \beta_2 Y_{T-1} + \dots + \beta_p Y_{T-p+1}$. In practice, the coefficients are unknown,
so forecasts must be based on estimates $\hat{\beta}_i$ of the coefficients $\beta_i$ obtained by the OLS
estimators from historical data. Let $\hat{Y}_{T+1|T}$ denote the forecast of $Y_{T+1}$ based on
$Y_T, Y_{T-1}, \dots$:

$$\hat{Y}_{T+1|T} = \hat{\beta}_0 + \hat{\beta}_1 Y_T + \hat{\beta}_2 Y_{T-1} + \dots + \hat{\beta}_p Y_{T-p+1}.$$

Then such a forecast refers to some data beyond the data set used to estimate the 
regression, so that the data on the actual value of the forecasted dependent variable 
are not in the sample used to estimate the regression. Forecasts and forecast error 
pertain to “out-of-sample” observations. 

The forecast error is the mistake made by the forecast; this is the difference
between the value of $Y_{T+1}$ that actually occurred and its forecasted value:
$\text{forecast error} := Y_{T+1} - \hat{Y}_{T+1|T}$.

The root mean squared forecast error (RMSFE) is a measure of the size of the
forecast error, $\mathrm{RMSFE} = \sqrt{E[(Y_{T+1} - \hat{Y}_{T+1|T})^2]}$, and it is characterized by two
sources of error: the error arising because future values of $u_t$ are unknown and the
error in estimating the coefficients $\beta_i$. If the first source of error is much larger than
the second, the RMSFE is approximately $\sqrt{\operatorname{var}(u_t)}$, the standard deviation of the
error $u_t$, which is estimated by the standard error of regression (SER). One useful
application in time-series forecasting is to test whether the lags of one regressor
have useful predictive content. The claim that a variable has no predictive content
corresponds to the null hypothesis that the coefficients on all lags of that variable
are zero. Such a hypothesis can be checked by the so-called Granger causality test
(GCT), a type of F-statistic approach used to test joint hypotheses about regression
coefficients. In particular, the GCT method tests the hypothesis that the coefficients
of all the values of the variable in $Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + u_t$,
namely the coefficients of $Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}$, are zero, and hence this null
hypothesis implies that such regressors have no predictive content for $Y_t$.
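The estimation and forecasting steps described in this section can be sketched as follows. This is a minimal illustration in pure Python, not the implementation used in the second part of the project: the helper names (`solve_linear`, `fit_ar`, `forecast_one_step`), the simulated AR(2) series, and its coefficients are all hypothetical. OLS is computed through the normal equations, and the SER is used as the estimate of the standard deviation of $u_t$.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(y, p):
    """OLS estimates (beta0, ..., betap) of an AR(p) and the standard error of regression."""
    rows = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(p, len(y))]
    targets = y[p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    ssr = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, targets))
    ser = math.sqrt(ssr / (len(rows) - k))
    return beta, ser


def forecast_one_step(y, beta):
    """Forecast of Y_{T+1} from the last p observations of y."""
    p = len(beta) - 1
    return beta[0] + sum(beta[j] * y[-j] for j in range(1, p + 1))


random.seed(0)
y = [0.0, 0.0]
for _ in range(3000):   # simulated AR(2) with hypothetical coefficients
    y.append(1.0 + 0.5 * y[-1] - 0.3 * y[-2] + random.gauss(0.0, 1.0))

beta_hat, ser = fit_ar(y[:-1], p=2)          # hold out the last observation
forecast = forecast_one_step(y[:-1], beta_hat)
forecast_error = y[-1] - forecast            # out-of-sample forecast error
print(beta_hat, ser, forecast_error)
```

The last observation is deliberately excluded from the estimation sample, so the reported forecast error is an out-of-sample one in the sense used above.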
2.2 Lag length selection

Let us recall relevant statistical methods used to optimally choose the number of
lags in an autoregression model; in particular, we focus our attention on the Bayes
information criterion (BIC) and on the Akaike information criterion (AIC); for more
details, see, for example, [7, Chap. 14.5]. The BIC is specified by

$$\mathrm{BIC}(p) = \ln\left(\frac{SSR(p)}{T}\right) + (p+1)\frac{\ln T}{T}, \tag{5}$$

where $SSR(p)$ is the sum of squared residuals of the estimated AR($p$). The BIC
estimator of $p$, denoted $\hat{p}$, is the value that minimizes BIC($p$) among all the possible
choices. In the first term of Eq. (5), the sum of squared residuals necessarily decreases when adding
a lag. In contrast, the second term is the number of estimated regression coefficients
times the factor $(\ln T)/T$, so this term increases when adding a lag. This implies that
the BIC trades off these two aspects. The AIC approach is defined by

$$\mathrm{AIC}(p) = \ln\left(\frac{SSR(p)}{T}\right) + (p+1)\frac{2}{T},$$

and hence the main difference between the AIC and BIC is that the term $\ln T$ in
the BIC is replaced by 2 in the AIC, so the second term in the AIC is smaller. But
the second term in the AIC is not large enough to assure choosing the correct length,
so this estimator of $p$ is not consistent. We recall that an estimator is consistent if, as
the size of the sample increases, its probability distribution concentrates at the value
of the parameter to be estimated. So, the BIC estimator $\hat{p}$ of the lag length in an
autoregression is correct in large samples, that is, $\Pr(\hat{p} = p) \to 1$. This is not true
for the AIC estimator, which can overestimate $p$ even in large samples; for the proof,
see, for example, [7, Appendix 14.5].
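To make the trade-off concrete, the following sketch evaluates both criteria on a set of made-up sums of squared residuals. It assumes the usual forms $\mathrm{BIC}(p) = \ln(SSR(p)/T) + (p+1)\ln T/T$ and $\mathrm{AIC}(p) = \ln(SSR(p)/T) + 2(p+1)/T$; the SSR values and the sample size are hypothetical and chosen only to show that the AIC may select a longer lag than the BIC.

```python
import math


def bic(ssr, p, T):
    """BIC(p) = ln(SSR(p)/T) + (p + 1) * ln(T) / T."""
    return math.log(ssr / T) + (p + 1) * math.log(T) / T


def aic(ssr, p, T):
    """AIC(p) = ln(SSR(p)/T) + (p + 1) * 2 / T."""
    return math.log(ssr / T) + (p + 1) * 2 / T


# Hypothetical sums of squared residuals of AR(p) fits on the same T observations.
T = 200
ssr_by_lag = {1: 120.0, 2: 100.0, 3: 98.0, 4: 97.9}

p_bic = min(ssr_by_lag, key=lambda p: bic(ssr_by_lag[p], p, T))
p_aic = min(ssr_by_lag, key=lambda p: aic(ssr_by_lag[p], p, T))
print(p_bic, p_aic)
```

With these numbers, the small drop in SSR from two to three lags is enough to pay the AIC penalty but not the heavier BIC penalty, so the two criteria disagree.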

2.3 Trends 

A further relevant topic in econometric analysis is constituted by nonstationarities
that are due to trends and breaks. A trend is a persistent long-term movement of
a variable over time. A time-series variable fluctuates around its trend. There are
two types of trends, deterministic and stochastic. A deterministic trend is a nonrandom
function of time. In contrast, a stochastic trend is characterized by a random
behavior over time. Our treatment of trends in economic time series focuses on
stochastic trends. One of the simplest models of time series with a stochastic trend is
the one-dimensional random walk defined by the relation $Y_t = Y_{t-1} + u_t$, where
$u_t$ is the error term represented by a normally distributed random variable with zero
mean and variance $\sigma^2 > 0$. In this case, the best forecast of tomorrow's value is
its value today. An extension of the latter is the random walk with drift defined by
$Y_t = \beta_0 + Y_{t-1} + u_t$, $\beta_0 \in \mathbb{R}$, where the best forecast is the value of the series
today plus the drift $\beta_0$. A random walk is nonstationary because the variance of a
random walk increases over time, so the distribution of $Y_t$ changes over time. In fact,
since $u_t$ is uncorrelated with $Y_{t-1}$, we have $\operatorname{var}(Y_t) = \operatorname{var}(Y_{t-1}) + \operatorname{var}(u_t)$ with
$\operatorname{var}(Y_t) = \operatorname{var}(Y_{t-1})$ if and only if $\operatorname{var}(u_t) = 0$. The random walk is a particular
case of an AR(1) model with $\beta_1 = 1$. If $|\beta_1| < 1$ and $u_t$ is stationary, then $Y_t$ is
stationary. The condition for the stationarity of an AR($p$) model is that the roots of
$1 - \beta_1 z - \beta_2 z^2 - \beta_3 z^3 - \dots - \beta_p z^p = 0$ are greater than one in absolute value.
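The root condition is easy to check by hand for low orders. The sketch below treats only the AR(2) case, where the roots of $1 - \beta_1 z - \beta_2 z^2 = 0$ follow from the quadratic formula; the function names, the tolerance, and the coefficient values are hypothetical, and a general AR($p$) would require a polynomial root finder instead.

```python
import cmath


def ar2_roots(beta1, beta2):
    """Roots of 1 - beta1*z - beta2*z**2 = 0 (beta2 != 0)."""
    # Rewrite as beta2*z**2 + beta1*z - 1 = 0 and apply the quadratic formula.
    disc = cmath.sqrt(beta1 ** 2 + 4 * beta2)
    return ((-beta1 + disc) / (2 * beta2), (-beta1 - disc) / (2 * beta2))


def ar2_is_stationary(beta1, beta2, tol=1e-9):
    """True if both roots lie strictly outside the unit circle."""
    return all(abs(z) > 1 + tol for z in ar2_roots(beta1, beta2))


print(ar2_is_stationary(0.5, 0.3))   # roots near 1.17 and -2.84
print(ar2_is_stationary(0.7, 0.3))   # z = 1 is a root: unit root
```

In the second call the coefficients sum to one, so $z = 1$ solves the equation and the process has a unit root.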

If an AR($p$) has a root equal to one, then we say that the series has a unit root and
a stochastic trend. Stochastic trends usually bring many issues, for example, the
autoregressive coefficients are biased toward zero. Because $Y_t$ is nonstationary, the
assumptions for time-series regression do not hold, and we cannot rely on estimators
and test statistics having their usual large-sample normal distributions; see, for
example, [7, Chap. 3.2]. In fact, the OLS estimator $\hat{\beta}_1$ of the autoregressive coefficient $\beta_1$ is
consistent, but it has a nonnormal distribution, and the asymptotic distribution of $\hat{\beta}_1$
is shifted toward zero. Another problem caused by stochastic trends is the nonnormal
distribution of the t-statistic, which means that conventional confidence intervals are
not valid and hypothesis tests cannot be conducted as usual. The t-statistic is an
important example of a test statistic, namely of a statistic used to perform a hypothesis
test. A statistical hypothesis test can make two types of mistakes: a type I error, in
which the null hypothesis is rejected when, in fact, it is true, and a type II error, in
which the null hypothesis is not rejected when, in fact, it is false. The prespecified
rejection probability of a statistical hypothesis test when the null hypothesis is true,
that is, the prespecified probability of a type I error, is called the significance level of
the test. The critical value of the test statistic is the value of the statistic for which the
test just rejects the null hypothesis at the given significance level. The p-value is the
probability of obtaining a test statistic, by random sampling variation, at least as
adverse to the null hypothesis value as is the statistic actually observed, assuming that
the null hypothesis is correct. Equivalently, the p-value is the smallest significance
level at which you can reject the null hypothesis. The value of the t-statistic is

$$t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}}$$

and is well approximated by the standard normal distribution when $n$ is large because
of the central limit theorem (see, e.g., [1, Chap. 4.3]). Moreover, stochastic trends can
lead two time series to appear related when they are not, a problem called spurious
regression (see, e.g., [5, Chap. 2] for examples). For the AR(1) model, the most
commonly used test to detect a stochastic trend is the Dickey-Fuller test (see,
e.g., [5, Chap. 3] for details). For this test, we first subtract $Y_{t-1}$ from both sides of
the equation $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$. Then we consider the following hypothesis
test:

$$H_0 : \delta = 0 \ \text{ versus } \ H_1 : \delta < 0 \ \text{ in } \ Y_t - Y_{t-1} = \Delta Y_t = \beta_0 + \delta Y_{t-1} + u_t$$

with $\delta = \beta_1 - 1$. For an AR($p$) model, it is standard to use the augmented Dickey-Fuller
test (ADF), which tests the null hypothesis $H_0 : \delta = 0$ against the one-sided
alternative $H_1 : \delta < 0$ in the regression

$$\Delta Y_t = \beta_0 + \delta Y_{t-1} + \gamma_1 \Delta Y_{t-1} + \gamma_2 \Delta Y_{t-2} + \dots + \gamma_p \Delta Y_{t-p} + u_t.$$

Let us note that, under the null hypothesis, $Y_t$ has a stochastic trend, whereas,
under the alternative hypothesis, $Y_t$ is stationary. The ADF statistic is the OLS
t-statistic testing $\delta = 0$. If, instead, the alternative hypothesis is that $Y_t$ is stationary
around a deterministic linear time trend, then this trend $t$ must be added as an
additional regressor. In this case, the Dickey-Fuller regression becomes

$$\Delta Y_t = \beta_0 + \alpha t + \delta Y_{t-1} + \gamma_1 \Delta Y_{t-1} + \gamma_2 \Delta Y_{t-2} + \dots + \gamma_p \Delta Y_{t-p} + u_t,$$

and we test for $\delta = 0$. The ADF statistic does not have a normal distribution, and
hence different critical values have to be used.
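For the simplest case, the Dickey-Fuller regression with an intercept and no lagged differences, the statistic can be computed with the closed-form formulas of simple regression. The sketch below is illustrative only: the function name and the simulated stationary AR(1) series are hypothetical, and the standard error is the homoskedasticity-only one. The value $-2.86$ is the usual large-sample 5% Dickey-Fuller critical value for the intercept-only case.

```python
import math
import random


def dickey_fuller_t(y):
    """OLS t-statistic for delta in  dY_t = beta0 + delta * Y_{t-1} + u_t."""
    x = y[:-1]                                   # Y_{t-1}
    d = [b - a for a, b in zip(y[:-1], y[1:])]   # dY_t = Y_t - Y_{t-1}
    n = len(x)
    x_bar, d_bar = sum(x) / n, sum(d) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    delta = sum((v - x_bar) * (w - d_bar) for v, w in zip(x, d)) / sxx
    beta0 = d_bar - delta * x_bar
    ssr = sum((w - beta0 - delta * v) ** 2 for v, w in zip(x, d))
    se_delta = math.sqrt(ssr / (n - 2) / sxx)    # homoskedasticity-only standard error
    return delta, delta / se_delta


random.seed(1)
y = [0.0]
for _ in range(500):   # stationary AR(1) with hypothetical beta1 = 0.5
    y.append(0.5 * y[-1] + random.gauss(0.0, 1.0))

delta_hat, t_stat = dickey_fuller_t(y)
CRITICAL_5_PERCENT = -2.86   # Dickey-Fuller 5% critical value, intercept-only case
print(delta_hat, t_stat, t_stat < CRITICAL_5_PERCENT)
```

Since the simulated series has $\beta_1 = 0.5$, the estimate of $\delta = \beta_1 - 1$ should be close to $-0.5$ and the unit-root null should be rejected; the statistic is compared with the Dickey-Fuller critical value and not with a normal quantile.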

2.4 Breaks 

A second type of nonstationarity arises when the regression function changes over the
course of the sample. In economics, this can occur for a variety of reasons, such as
changes in economic policy, changes in the structure of the economy, or an invention
that changes a specific industry. These breaks cannot be neglected by the regression
model. A problem caused by breaks is that the OLS regression estimates over the
full sample will estimate a relationship that holds "on average," in the sense that the
estimate combines two different periods, and this leads to poor forecasts. There are
two types of testing for breaks: testing for a break at a known date and for a break
at an unknown break date. We consider the first option for an AR($p$) model. Let $\tau$
denote the hypothesized break date, and let $D_t(\tau)$ be the binary variable such that
$D_t(\tau) = 0$ if $t > \tau$ and $D_t(\tau) = 1$ if $t < \tau$. Then the regression including the binary
break indicator and all interaction terms reads as follows:

$$\begin{aligned}
Y_t &= \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \gamma_0 D_t(\tau) \\
&\quad + \gamma_1 [D_t(\tau) \times Y_{t-1}] + \gamma_2 [D_t(\tau) \times Y_{t-2}] + \dots + \gamma_p [D_t(\tau) \times Y_{t-p}] + u_t.
\end{aligned}$$

The null hypothesis of no break is $\gamma_0 = \gamma_1 = \gamma_2 = \dots = \gamma_p = 0$. Under
the alternative hypothesis that there is a break, the regression function is different
before and after the break date $\tau$, and we can use the F-statistic, performing the
so-called Chow test (see, e.g., [6, Chap. 5.3.3]). If we suspect a break between two
dates $\tau_0$ and $\tau_1$, the Chow test can be modified to test for breaks at all possible dates $\tau$
between $\tau_0$ and $\tau_1$, then using the largest of the resulting F-statistics to test for a break
at an unknown date. The latter technique is called the Quandt likelihood ratio statistic
(QLR) (see, e.g., [7, Chap. 14.7]). Because the QLR statistic is the largest of many
F-statistics, its distribution is not the same as that of an individual F-statistic; also, the
critical values for the QLR statistic must be obtained from a special distribution.

3 MA and ARMA 

In the following, we consider finite-order moving-average (MA) processes (see, e.g.,
[6, Chap. 2.2]). The moving-average process of order $q$, MA($q$), is defined by
$Y_t = \alpha_0 + u_t - \alpha_1 u_{t-1} - \alpha_2 u_{t-2} - \dots - \alpha_q u_{t-q}$; equivalently, by using the lag operator,
we get $Y_t - \alpha_0 = (1 - \alpha_1 L - \alpha_2 L^2 - \dots - \alpha_q L^q)u_t$. Every finite MA($q$) process is
stationary, and we have

• $E[Y_t] = \alpha_0$,

• $V[Y_t] = E[(Y_t - \alpha_0)^2] = (1 + \alpha_1^2 + \alpha_2^2 + \dots + \alpha_q^2)\sigma^2$,

• $\operatorname{Cov}[Y_t, Y_{t+\tau}] = E[(Y_t - \alpha_0)(Y_{t+\tau} - \alpha_0)]$
$= E[u_t(u_{t+\tau} - \alpha_1 u_{t+\tau-1} - \dots - \alpha_q u_{t+\tau-q})$
$- \alpha_1 u_{t-1}(u_{t+\tau} - \alpha_1 u_{t+\tau-1} - \dots - \alpha_q u_{t+\tau-q})$
$- \dots - \alpha_q u_{t-q}(u_{t+\tau} - \alpha_1 u_{t+\tau-1} - \dots - \alpha_q u_{t+\tau-q})]$.

Combining both an autoregressive (AR) term of order $p$ and a moving-average (MA)
term of order $q$, we can define the process denoted as ARMA($p, q$) and represented
by

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \dots + \beta_p Y_{t-p} + u_t - \alpha_1 u_{t-1} - \dots - \alpha_q u_{t-q};$$

again, exploiting the lag operator, we can write

$$(1 - \beta_1 L - \dots - \beta_p L^p)Y_t = \beta_0 + (1 - \alpha_1 L - \alpha_2 L^2 - \dots - \alpha_q L^q)u_t,$$

$$\beta(L)Y_t = \beta_0 + \alpha(L)u_t.$$
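Evaluating the expectation in the MA($q$) covariance term by term, with uncorrelated errors of common variance $\sigma^2$, only products of shocks with the same date survive. The sketch below computes the resulting autocovariances; the function name and the MA(2) coefficients are hypothetical.

```python
def ma_autocovariance(alphas, sigma2, tau):
    """Cov[Y_t, Y_{t+tau}] for Y_t = alpha0 + u_t - alpha_1 u_{t-1} - ... - alpha_q u_{t-q}."""
    # Coefficients on u_t, u_{t-1}, ..., u_{t-q}: (1, -alpha_1, ..., -alpha_q)
    c = [1.0] + [-a for a in alphas]
    tau = abs(tau)
    if tau >= len(c):
        return 0.0      # no shocks in common beyond lag q
    return sigma2 * sum(c[i] * c[i + tau] for i in range(len(c) - tau))


alphas = [0.4, -0.2]    # hypothetical MA(2) coefficients alpha_1, alpha_2
print([ma_autocovariance(alphas, 1.0, tau) for tau in range(4)])
```

The value at lag zero reproduces the variance $(1 + \alpha_1^2 + \alpha_2^2)\sigma^2$, and all autocovariances beyond lag $q$ vanish, which is why every finite-order MA process is stationary.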


4 Vector autoregression 

In what follows, we focus our study on the so-called vector autoregression (VAR)
econometric model, also making some remarks on the relation between the univariate
time series models described in the first part and the systems of simultaneous
equations of traditional econometrics characterizing the VAR approach (see, e.g., [4,
Chap. 2]).

4.1 Representation of the system 

We have so far considered forecasting a single variable. However, it is often necessary
to allow for a multidimensional statistical analysis if we want to forecast the dynamics
of more than one variable. This section introduces a model for forecasting multiple
variables, namely the vector autoregression (VAR) model, in which lagged values
of two or more variables are used to forecast their future values. We start with the
autoregressive representation in a VAR model of order $p$, denoted by VAR($p$), where
each component depends on its own lagged values up to $p$ periods and on the lagged
values of all other variables up to order $p$. It follows that the main idea behind the
VAR model is to know how new information, appearing at a certain time point and
concerning one of the observed variables, is processed in the system and which impact
it has over time not only for this particular variable but also for the other system
variables. Hence, a VAR($p$) model is a set of $k$ time-series regressions ($k \in \mathbb{N}^+$) in
which the regressors are lagged values of all $k$ series and the number of lags equals $p$
for each equation. In the case of two time series variables, say, $Y_t$ and $X_t$, the VAR($p$)
consists of two equations of the form

$$\begin{cases}
Y_t = \beta_{10} + \beta_{11}Y_{t-1} + \dots + \beta_{1p}Y_{t-p} + \gamma_{11}X_{t-1} + \dots + \gamma_{1p}X_{t-p} + u_{1t}, \\
X_t = \beta_{20} + \beta_{21}Y_{t-1} + \dots + \beta_{2p}Y_{t-p} + \gamma_{21}X_{t-1} + \dots + \gamma_{2p}X_{t-p} + u_{2t},
\end{cases} \tag{6}$$

where the $\beta$s and the $\gamma$s are unknown coefficients, and $u_{1t}$ and $u_{2t}$ are error terms
represented by normally distributed random variables with zero mean and variance $\sigma^2 >
0$. The VAR assumptions are the same as those for the time-series regression defining
AR models and applied to each equation; moreover, the coefficients of each VAR
are estimated by means of the OLS approach. The reduced form of a vector
autoregression of order $p$ is defined as $Z_t = \delta + A_1 Z_{t-1} + A_2 Z_{t-2} + \dots + A_p Z_{t-p} + U_t$,
where $A_i$, $i = 1, \dots, p$, are $k$-dimensional square matrices, $U_t$ represents the
$k$-dimensional vector of residuals at time $t$, and $\delta$ is the vector of constant terms.
System (6) can be rewritten compactly as $A_p(L)Z_t = \delta + U_t$, where
$A_p(L) = I_k - A_1 L - A_2 L^2 - \dots - A_p L^p$, $E[U_t] = 0$, $E[U_t U_t'] = \Sigma_{uu}$, and $E[U_t U_s'] = 0$
for $t \neq s$. Such a system is stable if and only if all included variables are stationary,
that is, if all roots of the characteristic equation of the lag polynomial are
outside the unit circle, namely $\det(I_k - A_1 z - A_2 z^2 - \dots - A_p z^p) \neq 0$ for $|z| \leq 1$
(for details, see, e.g., [6, Chap. 4.1]). We use this condition because we saw in
Section 2.3 that the condition for the stationarity of an AR($p$) model is that the roots of
$1 - \beta_1 z - \beta_2 z^2 - \beta_3 z^3 - \dots - \beta_p z^p = 0$ are greater than one in absolute value.

If an AR($p$) has a root equal to one, we say that the series has a unit root and a
stochastic trend. Moreover, the previous system can be rewritten by exploiting the
MA representation as follows:

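$$\begin{aligned}
Z_t &= A^{-1}(L)\delta + A^{-1}(L)U_t \\
&= \mu + U_t - B_1 U_{t-1} - B_2 U_{t-2} - B_3 U_{t-3} - \dots \\
&= \mu + B(L)U_t
\end{aligned}$$

with

$$B_0 = I_k, \qquad B(L) := I - \sum_{j=1}^{\infty} B_j L^j = A^{-1}(L), \qquad \mu = A^{-1}(1)\delta = B(1)\delta.$$

The autocovariance matrices are defined as $\Gamma_Z(\tau) = E[(Z_t - \mu)(Z_{t-\tau} - \mu)']$; without
loss of generality, we set $\delta = 0$ and, therefore, $\mu = 0$, whence we obtain

$$E[Z_t Z_{t-\tau}'] = A_1 E[Z_{t-1}Z_{t-\tau}'] + A_2 E[Z_{t-2}Z_{t-\tau}'] + \dots + A_p E[Z_{t-p}Z_{t-\tau}'] + E[U_t Z_{t-\tau}']$$

and, for $\tau > 0$,

$$\Gamma_Z(\tau) = A_1 \Gamma_Z(\tau - 1) + A_2 \Gamma_Z(\tau - 2) + \dots + A_p \Gamma_Z(\tau - p),$$

$$\begin{aligned}
\Gamma_Z(0) &= A_1 \Gamma_Z(-1) + A_2 \Gamma_Z(-2) + \dots + A_p \Gamma_Z(-p) + \Sigma_{uu} \\
&= A_1 \Gamma_Z(1)' + A_2 \Gamma_Z(2)' + \dots + A_p \Gamma_Z(p)' + \Sigma_{uu}.
\end{aligned}$$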

Since the autocovariance matrix entries link a variable with both its delays and 
the remaining model variables, we have that if the autocovariance between X and Y 
is positive, then X tends to move accordingly with Y and vice versa, whereas if X 
and Y are independent, their autocovariance obviously equals zero. 

4.2 Determining lag lengths in VARs 

An appropriate method for the lag length selection of a VAR is fundamental to determine
the properties of the VAR and related estimates. There are two main approaches used
for selecting or testing lag length in VAR models. The first consists of rules of thumb
based on the periodicity of the data and past experience, and the second is based on
formal information criteria. VAR models typically include enough lags to capture the
full cycle of the data; for monthly data, this means that there is a minimum of 12 lags,
but we will also expect that there is some seasonality that is carried over from year
to year, so often lag lengths of 13-15 months are used (see, e.g., [4, Chap. 2.5]). For
quarterly data, it is standard to use six lags. This captures the cyclical components in
the year and any residual seasonal components in most cases. Usually, we
choose the number of lags so that $kp + 1 < T$, where $k$ is the number of endogenous
variables, $p$ is the lag length, and $T$ is the total number of observations. We
use this limitation because estimating all these coefficients increases the amount
of forecast estimation error, which can result in a deterioration of the accuracy of
the forecast itself. The lag length in a VAR can be formally determined using information
criteria; let $\hat{\Sigma}_{uu}$ be the estimate of the covariance matrix with the $(i, j)$ element
$\frac{1}{T}\sum_{t=1}^{T}\hat{u}_{it}\hat{u}_{jt}$, where $\hat{u}_{it}$ is the OLS residual from the $i$th equation, and $\hat{u}_{jt}$ is
the OLS residual from the $j$th equation. The BIC for a VAR model with $k$ equations is

$$\mathrm{BIC}(p) = \ln[\det(\hat{\Sigma}_{uu})] + k(kp + 1)\frac{\ln T}{T}, \tag{7}$$

whereas the AIC is computed using Eq. (7), modified by replacing the term $\ln T$ by 2.
Among a set of candidate values of $p$, the estimated lag length $\hat{p}$ is the value of $p$ that
minimizes BIC($p$).
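The VAR criterion differs from the univariate one only through the determinant of the residual covariance matrix and the count $k(kp+1)$ of estimated coefficients. A minimal sketch follows, assuming the residual covariance estimates have already been computed; the function names and the two candidate matrices are hypothetical.

```python
import math


def det(m):
    """Determinant by Laplace expansion along the first row (fine for small k)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def var_bic(sigma_hat, p, T):
    """BIC(p) = ln det(Sigma_hat) + k (k p + 1) ln(T) / T, with k = number of equations."""
    k = len(sigma_hat)
    return math.log(det(sigma_hat)) + k * (k * p + 1) * math.log(T) / T


def var_aic(sigma_hat, p, T):
    """Same as var_bic with ln(T) replaced by 2."""
    k = len(sigma_hat)
    return math.log(det(sigma_hat)) + k * (k * p + 1) * 2 / T


# Hypothetical residual covariance estimates for VAR(1) and VAR(2), T = 120.
candidates = {1: [[1.00, 0.20], [0.20, 0.80]], 2: [[0.97, 0.19], [0.19, 0.78]]}
p_hat = min(candidates, key=lambda p: var_bic(candidates[p], p, 120))
print(p_hat)
```

In this made-up example, the second lag reduces the determinant only slightly while adding $k^2 = 4$ coefficients, so the criterion keeps the VAR(1).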

4.3 Multiperiod VAR forecast 

Iterated multivariate forecasts are computed using a VAR in much the same way as
univariate forecasts are computed using an autoregression. The main new feature of a
multivariate forecast is that the forecast of one variable depends on the forecasts of all
variables in the VAR. To compute multiperiod VAR forecasts $h$ periods ahead, it is
necessary to compute forecasts of all variables for all intervening periods between $T$
and $T+h$. Then the following scheme applies: compute the one-period-ahead forecast
of all the variables in the VAR, then use those forecasts to compute the two-period-ahead
forecasts, and repeat the previous steps until the desired forecast horizon. For
example, the two-period-ahead forecast of $Y_{T+2}$ based on the two-variable VAR($p$)
in Eq. (6) is

$$\begin{aligned}
\hat{Y}_{T+2|T} &= \hat{\beta}_{10} + \hat{\beta}_{11}\hat{Y}_{T+1|T} + \hat{\beta}_{12}Y_T + \hat{\beta}_{13}Y_{T-1} + \dots + \hat{\beta}_{1p}Y_{T-p+2} \\
&\quad + \hat{\gamma}_{11}\hat{X}_{T+1|T} + \hat{\gamma}_{12}X_T + \hat{\gamma}_{13}X_{T-1} + \dots + \hat{\gamma}_{1p}X_{T-p+2},
\end{aligned} \tag{8}$$

where the coefficients in (8) are the OLS estimates of the VAR coefficients.
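The iterated scheme can be written compactly in the reduced form $Z_t = \delta + A_1 Z_{t-1} + \dots + A_p Z_{t-p} + U_t$. The sketch below assumes that the coefficient matrices have already been estimated; the function name and the VAR(1) numbers are hypothetical.

```python
def var_forecast(const, coef, history, h):
    """Iterated h-step forecasts from Z_t = const + A_1 Z_{t-1} + ... + A_p Z_{t-p}.

    coef is [A_1, ..., A_p] (each a k x k list of lists); history holds the
    observed vectors in time order, the last entry being Z_T.
    """
    p, k = len(coef), len(const)
    path = [list(z) for z in history[-p:]]
    forecasts = []
    for _ in range(h):
        z_next = list(const)
        for lag, a in enumerate(coef, start=1):
            z_lag = path[-lag]
            for i in range(k):
                z_next[i] += sum(a[i][j] * z_lag[j] for j in range(k))
        path.append(z_next)          # forecasts feed later forecasts
        forecasts.append(z_next)
    return forecasts


# Hypothetical VAR(1) in (Y_t, X_t) with Z_T = (2, 1).
const = [1.0, 0.0]
a1 = [[0.5, 0.1], [0.2, 0.3]]
print(var_forecast(const, [a1], [[2.0, 1.0]], h=2))
```

Each forecast vector is appended to the path before the next step, so the two-period-ahead forecast of either variable uses the one-period-ahead forecasts of both.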

4.4 Granger causality 

An important question in multiple time series is to assess the value of individual
variables in explaining the remaining ones in the considered system of equations. An
example is the value of a variable $Y_t$ for predicting another variable $X_t$ in a dynamic
system of equations or understanding if the variable $Y_t$ is informative about future
values of $X_t$. The answer is based on the determination of the so-called Granger
causality for a time-series model (for details, see, e.g., [4, Chap. 2.5.4]).
To define the concept precisely, consider the bivariate VAR model for two variables
$(Y_t, X_t)$ as in Eq. (6). Using this system of equations, Granger causality states that,
for linear models, $X_t$ Granger causes $Y_t$ if the past values of $X_t$, together with the past
values of $Y_t$, predict $Y_t$ better than the past values of $Y_t$ alone. For the model in system (6),
if $X_t$ Granger causes $Y_t$, then the coefficients for the past values of $X_t$ in the $Y_t$ equation
are not all zero, that is, $\gamma_{1i} \neq 0$ for some $i \in \{1, 2, \dots, p\}$. Similarly, if $Y_t$ Granger
causes $X_t$, then, in the $X_t$ equation, the coefficients for the past values of $Y_t$ are not all
zero, that is, $\beta_{2i} \neq 0$ for some $i \in \{1, 2, \dots, p\}$. The formal testing for Granger causality
is then done by using an F test for the joint hypothesis that the possible causal variable
does not cause the other variable. We can specify the null hypothesis for the Granger
causality test as follows.


$H_0$: Granger noncausality, $X_t$ does not predict $Y_t$, if
$\gamma_{11} = \gamma_{12} = \dots = \gamma_{1p} = 0$.

$H_1$: Granger causality, $X_t$ does predict $Y_t$, if
$\gamma_{11} \neq 0$, $\gamma_{12} \neq 0$, ..., or $\gamma_{1p} \neq 0$,


whereas the F test implementation is based on two models. 

Model 1 (unrestricted) 

$$Y_t = \beta_{10} + \beta_{11} Y_{t-1} + \cdots + \beta_{1p} Y_{t-p} + \gamma_{11} X_{t-1} + \cdots + \gamma_{1p} X_{t-p} + u_{1t}.$$

Model 2 (restricted)

$$Y_t = \beta_{10} + \beta_{11} Y_{t-1} + \cdots + \beta_{1p} Y_{t-p} + u_{1t}.$$

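In the first model, the coefficients $\gamma_{11}, \gamma_{12}, \ldots, \gamma_{1p}$ are
unrestricted, so the variable $X_t$ appears in the equation of $Y_t$, namely, the values
of $X_t$ can be used to predict $Y_t$. Instead, in the second model,
$\gamma_{11} = \gamma_{12} = \cdots = \gamma_{1p} = 0$, so $X_t$ does not Granger
cause $Y_t$. Under the null hypothesis, the test statistic has an F distribution with
$(p, T - 2p - 1)$ degrees of freedom:

$$F = \frac{(RSS_2 - RSS_1)/p}{RSS_1/(T - 2p - 1)} \sim F(p, T - 2p - 1),$$

where $RSS_1$ and $RSS_2$ denote the residual sums of squares of Model 1 and Model 2,
respectively.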


If this F statistic is greater than the critical value for a chosen level of significance, we
reject the null hypothesis that $X_t$ has no effect on $Y_t$ and conclude that $X_t$ Granger
causes $Y_t$.
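
The test can be sketched numerically as follows. This is a minimal illustration, assuming NumPy is available; the function names (`lagged_design`, `rss`, `granger_f`) and the simulated data are hypothetical and not part of the paper. Here the degrees of freedom are computed from the number of observations actually used in the regression (the sample length minus the p initial values lost to lagging). The sketch returns only the F statistic, which would then be compared with a critical value from an F table.

```python
import numpy as np


def lagged_design(y, x, p, include_x=True):
    """Regressors [1, Y_{t-1..t-p}, (X_{t-1..t-p})] and the target Y_t."""
    T = len(y)
    cols = [np.ones(T - p)]
    cols += [y[p - i:T - i] for i in range(1, p + 1)]
    if include_x:
        cols += [x[p - i:T - i] for i in range(1, p + 1)]
    return np.column_stack(cols), y[p:]


def rss(design, target):
    """Residual sum of squares of the OLS fit of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(y, x, p):
    """F statistic for H0: X does not Granger cause Y, with p lags."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    design_u, target = lagged_design(y, x, p, include_x=True)   # Model 1
    design_r, _ = lagged_design(y, x, p, include_x=False)       # Model 2
    rss_u, rss_r = rss(design_u, target), rss(design_r, target)
    n = len(target)                      # observations used in the regression
    df1, df2 = p, n - 2 * p - 1
    f_stat = ((rss_r - rss_u) / df1) / (rss_u / df2)
    return f_stat, df1, df2


# Simulated example (assumption): X drives Y with one lag, but not vice versa.
rng = np.random.default_rng(0)
T = 300
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

F_xy, df1, df2 = granger_f(y, x, p=2)   # does X Granger cause Y?
F_yx, _, _ = granger_f(x, y, p=2)       # does Y Granger cause X?
print(F_xy, F_yx, (df1, df2))
```

In this simulated setting, the statistic for the direction from X to Y should be large, whereas the one for the reverse direction should be of the order expected under the null hypothesis.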

4.5 Cointegration 

In Section 2.3, we introduced the model of random walk with drift as follows: 


$$Y_t = \beta_0 + Y_{t-1} + u_t. \qquad (9)$$

If $Y_t$ follows Eq. (9), then it has an autoregressive root that equals 1. If we consider a
random walk for the first difference of the trend, then we obtain

$$\Delta Y_t = \beta_0 + \Delta Y_{t-1} + u_t. \qquad (10)$$

Hence, if $Y_t$ follows Eq. (10), then $\Delta Y_t$ follows a random walk, and accordingly
$\Delta Y_t - \Delta Y_{t-1}$ is stationary; this is the second difference of $Y_t$ and is
denoted $\Delta^2 Y_t$. A series that has a random walk trend is said to be integrated of
order one, or I(1); a series that has a trend of the form (10) is said to be integrated of
order two, or I(2); and a series that has no stochastic trend and is stationary is said to be
integrated of order zero, or I(0). The order of integration in the I(1) and I(2) terminology
is the number of times that the series needs to be differenced for it to be stationary. If
$Y_t$ is I(2), then $\Delta Y_t$ is I(1), so $\Delta Y_t$ has an autoregressive root that
equals 1. If, however, $Y_t$ is I(1), then $\Delta Y_t$ is stationary. Thus, the null
hypothesis that $Y_t$ is I(2) can be tested against the alternative hypothesis that $Y_t$
is I(1) by testing whether $\Delta Y_t$ has a unit autoregressive root.

Sometimes, two or more series have the same stochastic trend in common. In this special
case, referred to as cointegration, regression analysis can reveal long-run relationships
among time series variables. One could think that a linear combination of two I(1)
processes is again an I(1) process. However, this is not always true. Two or more series
that have a common stochastic trend are said to be cointegrated. Suppose that $X_t$ and
$Y_t$ are integrated of order one. If, for some coefficient $\theta$, $Y_t - \theta X_t$
is integrated of order zero, then $X_t$ and $Y_t$ are said to be cointegrated, and the
coefficient $\theta$ is called the cointegrating coefficient. If $X_t$ and $Y_t$ are
cointegrated, then they have a common stochastic trend that can be eliminated by
computing the difference $Y_t - \theta X_t$. There are three ways to decide whether two
variables can be plausibly modeled exploiting the cointegration approach, namely,
by expert knowledge and economic theory, by a qualitative (graphical) analysis of
the series checking for a common stochastic trend, and by performing statistical tests
for cointegration. In particular, there is a cointegration test for the case where
$\theta$ is unknown. Initially, the cointegrating coefficient $\theta$ is estimated by
OLS estimation of the regression

$$Y_t = \alpha + \theta X_t + z_t, \qquad (11)$$

and then we use the Dickey-Fuller test (see Section 2.3) to test for a unit root in $z_t$;
this procedure is called the Engle-Granger augmented Dickey-Fuller test for cointegration
(EG-ADF test); for details, see, for example, [6, Chap. 6.2]. The concepts covered so far
can be extended to the case of more than two variables; for example, three variables,
each of which is I(1), are said to be cointegrated if
$Y_t - \theta_1 X_{1t} - \theta_2 X_{2t}$ is stationary. In the EG-ADF test, the
Dickey-Fuller statistic requires different critical values (see Table 1), where the
appropriate row depends on the number of regressors used in the first step of estimating
the OLS cointegrating regression.

Table 1. Critical values for the EG-ADF statistic

Number of regressors     10%       5%        1%
1                       -3.12     -3.41     -3.96
2                       -3.52     -3.80     -4.36
3                       -3.84     -4.16     -4.73
4                       -4.20     -4.49     -5.07
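
The two steps of the EG-ADF procedure can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the helper names (`ols`, `eg_adf`), the number of augmentation lags, the inclusion of an intercept in the second-step regression, and the simulated series are all assumptions made for the example. The resulting t statistic is meant to be compared with the critical values of Table 1.

```python
import numpy as np


def ols(design, target):
    """OLS coefficients, residuals, and homoskedastic standard errors."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    dof = len(target) - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, resid, np.sqrt(np.diag(cov))


def eg_adf(y, x, lags=1):
    """Return (theta_hat, t statistic) of the Engle-Granger ADF test."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Step 1: cointegrating regression Y_t = alpha + theta X_t + z_t, Eq. (11).
    coef, z, _ = ols(np.column_stack([np.ones(len(x)), x]), y)
    theta_hat = coef[1]
    # Step 2: ADF regression on the residuals (intercept included by assumption):
    # dz_t = c + delta z_{t-1} + a_1 dz_{t-1} + ... + a_k dz_{t-k} + e_t.
    dz = np.diff(z)
    n = len(dz)
    cols = [np.ones(n - lags), z[lags:-1]]
    cols += [dz[lags - k:n - k] for k in range(1, lags + 1)]
    b, _, se = ols(np.column_stack(cols), dz[lags:])
    return theta_hat, b[1] / se[1]


# Simulated cointegrated pair (assumption): X is a random walk and
# Y = 2 + 1.5 X + stationary noise, so the true theta equals 1.5.
rng = np.random.default_rng(1)
T = 500
x = np.cumsum(rng.normal(size=T))
y = 2.0 + 1.5 * x + rng.normal(size=T)

theta_hat, t_stat = eg_adf(y, x, lags=1)
critical_5pct = -3.41  # Table 1, one regressor, 5% level
print(theta_hat, t_stat, t_stat < critical_5pct)
```

A t statistic below the tabulated critical value leads to rejecting the null hypothesis of a unit root in the residuals, that is, it supports cointegration.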

A different estimator of the cointegrating coefficient is the dynamic OLS (DOLS)
estimator, which is based on the equation

$$Y_t = \beta_0 + \theta X_t + \sum_{j=-p}^{p} \delta_j \Delta X_{t-j} + u_t. \qquad (12)$$

In particular, from Eq. (12) we notice that DOLS includes past, present, and future
values of the changes in $X_t$. The DOLS estimator of $\theta$ is the OLS estimator of
$\theta$ in Eq. (12). The DOLS estimator is efficient, and statistical inferences about
$\theta$ and the $\delta_j$ in Eq. (12) are valid. If we have cointegration in more than
two variables, for example, three variables $Y_t$, $X_{1t}$, $X_{2t}$, each of which is
I(1), then they are cointegrated with cointegrating coefficients $\theta_1$ and
$\theta_2$ if $Y_t - \theta_1 X_{1t} - \theta_2 X_{2t}$ is stationary. The EG-ADF
procedure to test for a single cointegrating relationship among multiple variables is
the same as for the case of two variables, except that the regression in Eq. (11) is
modified so that both $X_{1t}$ and $X_{2t}$ are regressors. The DOLS estimator of a single
cointegrating relationship among multiple $X$'s involves the level of each $X$ along with
leads and lags of the first difference of each $X$.
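
A minimal sketch of the DOLS regression in Eq. (12) for a single regressor is given below, assuming NumPy is available. The function name `dols`, the choice p = 2, and the simulated data are assumptions made for the illustration. The sketch returns point estimates only; valid inference would additionally require suitable standard errors, which are not computed here.

```python
import numpy as np


def dols(y, x, p=2):
    """DOLS estimate of theta in Eq. (12), with p leads and p lags of dX."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(x)
    dx = np.diff(x)                 # dx[s - 1] = X_s - X_{s-1} for s = 1..T-1
    # Usable time indices: all of dX_{t-p}, ..., dX_{t+p} must exist.
    t = np.arange(p + 1, T - p)
    cols = [np.ones(len(t)), x[t]]
    # One column per j = -p, ..., p holding dX_{t-j}.
    cols += [dx[t - j - 1] for j in range(-p, p + 1)]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y[t], rcond=None)
    theta_hat, deltas = coef[1], coef[2:]
    return theta_hat, deltas


# Simulated example (assumption): the error of the cointegrating relation
# is correlated with the current change in X, and the true theta equals 2.
rng = np.random.default_rng(2)
T = 500
v = rng.normal(size=T)
x = np.cumsum(v)
u = 0.5 * v + rng.normal(size=T)
y = 1.0 + 2.0 * x + u

theta_hat, deltas = dols(y, x, p=2)
print(theta_hat, deltas)
```

In this simulated setting, the coefficient on the contemporaneous change of X (the middle entry of `deltas`) absorbs the correlation between the error term and the changes in X, which is the reason for including the leads and lags in the regression.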

5 Conclusion 

In this first part of our project, which uses multivariate statistical techniques to
study critical econometric data of one of the most influential economies in Italy, namely
the Verona import-export time series, we have focused on a self-contained
introduction to techniques for estimating OLS-type regressions, to the analysis of the
correlations obtained between the different variables, and to various types of information
criteria used to check the goodness of fit. Particular attention has been devoted to
the application of tests able to reveal various types of nonstationarity in the
considered time series, for example, the augmented Dickey-Fuller test (ADF) and
the Quandt likelihood ratio statistic (QLR). Moreover, we have also presented both
the Granger causality test and the Engle-Granger augmented Dickey-Fuller test for
cointegration (EG-ADF) in order to analyze if and how these variables are related
to each other and to measure how much information one variable gives about
the other one. Such approaches constitute the core of the second part of our project,
namely the aforementioned Verona case study.